I used Amazon Translate to translate documents written in multiple source languages into several target languages.
To reach a worldwide audience of consumers, clients, and investors, businesses must translate business-critical content such as promotional materials, guidebooks, and online ordering pages into different languages. Determining the source language of each document before starting a translation job is a challenge.
Overview
Amazon Translate now supports automatic source language detection for batch translation jobs, so you can translate a batch of documents written in different supported languages with a single translation job. You no longer need to detect the dominant language of each document and sort the documents by language before translating them. Amazon Translate determines the dominant language of each source document using Amazon Comprehend and uses it as the source language. A single job can also translate into multiple target languages (up to 10).
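To show what automatic detection saves you from, here is a minimal sketch of the manual step it replaces: detecting the dominant language of each document with Amazon Comprehend and grouping documents by language. It is not how Amazon Translate implements the feature internally. The helper names `dominant_language` and `group_by_language` are hypothetical, and the `comprehend` argument is assumed to be a boto3 Comprehend client (or any object with a compatible `detect_dominant_language` method).

```python
from collections import defaultdict


def dominant_language(comprehend, text):
    """Return the language code that Comprehend scores highest for the text."""
    response = comprehend.detect_dominant_language(Text=text)
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"]


def group_by_language(comprehend, documents):
    """Map each detected language code to the names of the documents in it.

    `documents` is a dict of {document name: document text}.
    """
    groups = defaultdict(list)
    for name, text in documents.items():
        groups[dominant_language(comprehend, text)].append(name)
    return dict(groups)
```

With AWS credentials configured, this could be called as `group_by_language(boto3.client("comprehend"), documents)`. With `SourceLanguageCode` set to `auto`, the batch job makes this grouping step unnecessary.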
Create a batch translation job via the console
In this blog post, we will use batch translation to automatically identify the source language of each document and translate the documents into multiple target languages (Japanese and Spanish). Both the input and the output are stored in Amazon S3.
NOTE: Batch translation is supported in the following AWS Regions:
- US East (N. Virginia)
- US East (Ohio)
- US West (Oregon)
- Asia Pacific (Seoul)
- Europe (Frankfurt)
- Europe (Ireland)
- Europe (London)
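If you create jobs from code, a small guard can catch an unsupported Region before the job is submitted. The sketch below mirrors the list above using the standard AWS Region codes. The helper name `supports_batch_translation` is hypothetical, and the list reflects this post only, so check the current AWS documentation before relying on it.

```python
# Region codes for the Regions listed in this post (may be out of date).
SUPPORTED_BATCH_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-2": "US West (Oregon)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "eu-west-2": "Europe (London)",
}


def supports_batch_translation(region_name):
    """Return True if the Region code appears in the list from this post."""
    return region_name in SUPPORTED_BATCH_REGIONS
```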
You can optionally choose whether the output should use a formal or informal tone, and you can enable profanity masking to mask profane words or phrases.
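When the job is created through the SDK rather than the console, these options map to the `Settings` parameter of `start_text_translation_job`. The following is a minimal sketch; `build_settings` is a hypothetical helper that only assembles the dictionary, and formality is assumed to be available for the target languages you choose.

```python
def build_settings(formality=None, mask_profanity=False):
    """Build the optional Settings dict for a translation job (hypothetical helper)."""
    settings = {}
    if formality is not None:
        value = formality.upper()
        if value not in ("FORMAL", "INFORMAL"):
            raise ValueError("formality must be 'FORMAL' or 'INFORMAL'")
        settings["Formality"] = value
    if mask_profanity:
        settings["Profanity"] = "MASK"
    return settings


settings = build_settings("formal", mask_profanity=True)
# With a boto3 Translate client, this would be passed as:
#   client.start_text_translation_job(..., Settings=settings)
```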
After the translation job completes, check the output S3 bucket location to confirm that each document was translated into each target language.
The input consists of two files in two distinct languages, so four output documents are expected: each of the two source documents translated into both target languages.
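The expected count is simply the number of source documents multiplied by the number of target languages. A short sketch makes the arithmetic explicit; the file names are hypothetical placeholders, and the helper does not reproduce how Amazon Translate names its output objects.

```python
from itertools import product


def expected_translations(source_documents, target_languages):
    """Return every (source document, target language) pair the job should produce."""
    return list(product(source_documents, target_languages))


pairs = expected_translations(["document-a.txt", "document-b.txt"], ["ja", "es"])
for document, language in pairs:
    print(f"{document} -> {language}")
```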
Create a batch translation job via the AWS SDK
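import boto3

# Amazon Translate client; uses the Region and credentials of the Lambda function
client = boto3.client('translate')


def lambda_handler(event, context):
    # Start an asynchronous batch translation job
    response = client.start_text_translation_job(
        JobName='Translation-job',
        InputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-INPUT-BUCKET>>/input',
            'ContentType': 'text/plain'
        },
        OutputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-OUTPUT-BUCKET>>/output'
        },
        DataAccessRoleArn='<<REPLACE-WITH-THE-IAM-ROLE-ARN>>',
        # 'auto' lets Amazon Translate detect the language of each document
        SourceLanguageCode='auto',
        # Translate every document into Japanese and Spanish
        TargetLanguageCodes=['ja', 'es']
    )
    # Return the job ID and initial status so the caller can track the job
    return {'JobId': response['JobId'], 'JobStatus': response['JobStatus']}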
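Because the job runs asynchronously, you may want to wait for it to finish before reading the output bucket. The following is a minimal polling sketch, assuming `client` is a boto3 Translate client and `job_id` is the `JobId` returned when the job was started. The helper name `wait_for_job` and its default intervals are hypothetical choices, not recommendations from AWS.

```python
import time

# Job statuses after which the job will not change state again.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, poll_seconds=30, max_polls=120):
    """Poll the job until it reaches a terminal status and return that status."""
    for _ in range(max_polls):
        response = client.describe_text_translation_job(JobId=job_id)
        status = response["TextTranslationJobProperties"]["JobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

A long polling loop is a poor fit for a Lambda function with a short timeout, so a helper like this is better suited to a script or a scheduled check.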